Google Cloud Storage (GCS)
Question: We already have so many storage systems, so why do we still need GCS (Google Cloud Storage)?
Answer: To store unstructured data (files, images, videos, etc.) and to serve it through a CDN. BigTable, BigQuery and ElasticSearch are all designed for structured data.
Definition: A Powerful, Simple and Cost Effective Object Storage Service.
Think of it as something like Google Drive, Dropbox or Amazon S3.
Features
Durable
- Google Cloud Storage is designed for 99.999999999% durability. It stores data redundantly, with automatic checksums to ensure data integrity. With Multi-Regional storage, your data is maintained in geographically distinct locations.
Available
- All storage classes offer very high availability around the world.
Scalable
Inexpensive
- A good fit for storing unstructured data such as images, files and videos.
Content Delivery Network (CDN)
A content delivery network or content distribution network (CDN) is a geographically distributed network of proxy servers and their data centers.
A CDN offers an efficient, cost-effective way of reducing both network I/O costs and content delivery latency for regularly accessed website assets.
A CDN can be understood as a group of geographically distributed caches, with each cache located in one of several global points of presence.
Traditional server vs. CDN
Companies like Akamai, Cloudflare, etc.
GCS Bucket
First of all, we need a GCS bucket. Why do we need it? A bucket is like a folder that holds our files (images/videos posted by users). Bucket == Folder.
Open console.cloud.google.com and choose Storage -> Browser.
Then click ‘CREATE BUCKET’ to create a new bucket.
Pick a name that starts with post-images-, followed by your project ID as the suffix to avoid conflicts with other users. If the name is already taken, add a random number. For example, I use post-images-75015, while yours will be different. Please remember this bucket name; we will use it later.
When asked about ‘Default storage class’, Regional is fine: our servers (GAE flex) are in us-central-, so we can put this bucket in us-central- too. Click ‘save’ to save your changes.
Multipart Form
An HTTP multipart request is a request that HTTP clients construct to send files and data to an HTTP server.
It is commonly used by browsers and HTTP clients to upload files to the server.
The content type “application/x-www-form-urlencoded” is inefficient for sending large quantities of binary data or text containing non-ASCII characters.
The content type “multipart/form-data” should be used for submitting forms that contain files, non-ASCII data, and binary data.
How to send a multipart request in Postman?
What does it look like?
How to parse a multipart form in Go?
func (r *Request) ParseMultipartForm(maxMemory int64) error
https://golang.org/pkg/net/http/#Request.ParseMultipartForm
https://github.com/golang-samples/http/blob/master/fileupload/main.go
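As a rough illustration of the parsing side, the sketch below calls ParseMultipartForm inside a handler and then reads one text field and one file. The handler name uploadHandler, the helper postForm and the 32 MB limit are assumptions for this example, not the project's actual handlerPost; httptest is used so it runs without starting a server.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// uploadHandler (hypothetical) parses a multipart form and echoes the
// message, the uploaded file name and the file size.
func uploadHandler(w http.ResponseWriter, r *http.Request) {
	// Up to 32 MB of file parts are kept in memory; the rest goes to temp files.
	if err := r.ParseMultipartForm(32 << 20); err != nil {
		http.Error(w, "cannot parse multipart form", http.StatusBadRequest)
		return
	}
	message := r.FormValue("message")
	file, header, err := r.FormFile("image")
	if err != nil {
		http.Error(w, "image is required", http.StatusBadRequest)
		return
	}
	defer file.Close()
	data, err := io.ReadAll(file)
	if err != nil {
		http.Error(w, "cannot read image", http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "%s %s %d", message, header.Filename, len(data))
}

// postForm (hypothetical test helper) sends a form through the handler
// in memory and returns the status code and response body.
func postForm(message string, image []byte) (int, string) {
	body := &bytes.Buffer{}
	mw := multipart.NewWriter(body)
	mw.WriteField("message", message)
	if image != nil {
		part, _ := mw.CreateFormFile("image", "photo.jpg")
		part.Write(image)
	}
	mw.Close()
	req := httptest.NewRequest(http.MethodPost, "/post", body)
	req.Header.Set("Content-Type", mw.FormDataContentType())
	rec := httptest.NewRecorder()
	uploadHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, out := postForm("hello", []byte("abc"))
	fmt.Println(code, out) // prints "200 hello photo.jpg 3"
}
```

The file returned by FormFile is an io.Reader, so it can be streamed to another destination instead of being read fully into memory as done here.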
Code changes
Question: Why do we need GCS in this project?
Answer: To save the images uploaded by users and serve them.
Update handlerPost to save the uploaded image to GCS.
import (
Implement saveToGCS.
Google has provided a good example of writing objects to GCS. https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-go
Google example of opening a client connection to GCS
package main
Google example of writing an object to GCS (copied from https://github.com/GoogleCloudPlatform/golang-samples/blob/master/storage/objects/main.go)
func write(client *storage.Client, bucket, object string) error {
Answer:
// Save an image to GCS.
Remember to install package:
go get -u cloud.google.com/go/storage
I ran into a puzzling bug here:
error: elastic: Error 400 (Bad Request): failed to parse [type=mapper_parsing_exception]
The problem was on the Postman side. Create a new POST request, or try changing the type of the url to plain text.
Test
Local test (on your own computer)
cd to the folder where you have main.go
go run main.go
Open Postman, change the method to POST, and enter ‘http://localhost:8080/post’ as the url.
In the Body part, choose form-data, and then enter lat, lon, message and image. For image, set the type to ‘file’ so that you can upload an image from your local storage.
Verify the search API is working
Open Postman, change the method to GET, and change the url to ‘http://localhost:8080/search?lat=37.5&lon=-120.5&range=200’
Copy the image url from the search result into your browser and download the image to verify that it’s the same image that you uploaded.
If you have any auth issues, try
gcloud auth application-default login
If you have permission problems, try
sudo chown -R $(whoami):staff ~/.config/gcloud/
Integration test (TBD)
Remote test (homework)
Commit your changes to GitHub and then open a new cloud terminal.
Check out from GitHub
git pull origin master
cd to your folder with main.go and app.yaml
gcloud app deploy
Wait until the deployment is done.
Open Postman.
Instead of using ‘http://localhost:8080’, change it to your service url (without the port). For example, ‘http://around-179500.appspot.com/post’ and ‘http://around-179500.appspot.com/search?lat=37.5&lon=-120.5&range=200’.
The other steps are the same.
CORS
Cross-Origin Resource Sharing (CORS) standard
Pre-flight requests: You may have a content type like JSON, or some other custom header that’s triggering a pre-flight request, which your server may not be handling.
Try adding this one, if you’re using the ever-common AJAX in your front-end: https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Requested-With
Gorilla’s handlers.CORS() will set sane defaults to get the basics of CORS working for you; however, you can (and maybe should) take control in a more functional manner.
https://stackoverflow.com/questions/40985920/making-golang-gorilla-cors-handler-work
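For a sense of what handling CORS by hand involves, here is a minimal sketch using only net/http. It is not Gorilla's implementation: the middleware name withCORS, the wildcard origin and the header lists are assumptions chosen for illustration, and a production service would normally restrict the allowed origins.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withCORS (hypothetical middleware) adds permissive CORS headers to every
// response and answers pre-flight OPTIONS requests without calling next.
func withCORS(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("Access-Control-Allow-Origin", "*")
		h.Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
		h.Set("Access-Control-Allow-Headers", "Content-Type, Authorization, X-Requested-With")
		if r.Method == http.MethodOptions {
			// Pre-flight: the browser only needs the headers, not a body.
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// serve (hypothetical test helper) runs one request through the middleware
// in memory and returns the status code and the Allow-Origin header.
func serve(method string) (int, string) {
	h := withCORS(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	req := httptest.NewRequest(method, "/post", nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	fmt.Println(serve(http.MethodOptions)) // prints "204 *"
	fmt.Println(serve(http.MethodPost))    // prints "200 *"
}
```

Wrapping the router once with such a middleware keeps the individual handlers free of CORS details.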
(Optional) Vim go plugin:
https://github.com/fatih/vim-go
Spam and Abuse Detection
Any user-facing IT company has to address the Spam and Abuse problem in production.
Question: Why does our project need to think about the spam problem?
Answer:
Categories of Spam and Abuse in Industry
- Racy/Nudity (Child Porn especially)
- Harassment or Hate Speech (n* word, etc.)
- Fake News (Facebook)
- Keyword Spam (Keyword stuffing)
- Drug Abuse
- Violence or Bullying
- Spam
- Copycat (IP Infringement)
- Phishing (CEO Phishing for SSN)
- Privacy Leak
Solutions
- Keyword based (rule based)
- n words, s words, f* words, etc.
- Machine Learning Model (images and videos)
- Open NSFW model
- Geo location based
- Strip clubs, etc.
- User based
- User ban
- User warning
- Aggregated user signals
- User feedback
- report
- Machine Learning
- Moderators
- External channels
- Internal users like employees
- Media coverage
- Others
Question:
In our current design, what kind of spam filters can we enforce?
Answer: For example, a keyword-based spam filter.
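A keyword-based filter can be sketched in a few lines of Go. The function name containsBlockedWord and the placeholder word list are assumptions for illustration; a real list would be maintained separately and would be much longer. The check matches whole words only, ignoring case and punctuation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// blockedWords is a placeholder list used only for this sketch.
var blockedWords = []string{"spamword", "badword"}

// containsBlockedWord (hypothetical) reports whether any whole word in the
// message equals a blocked keyword, ignoring case and punctuation.
func containsBlockedWord(message string) bool {
	// Split on anything that is not a letter or digit.
	words := strings.FieldsFunc(strings.ToLower(message), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	blocked := make(map[string]bool, len(blockedWords))
	for _, w := range blockedWords {
		blocked[w] = true
	}
	for _, w := range words {
		if blocked[w] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsBlockedWord("Buy SPAMWORD now!")) // prints "true"
	fmt.Println(containsBlockedWord("hello world"))       // prints "false"
}
```

Such a check could run on the message field before a post is saved. Whole-word matching avoids flagging harmless words that merely contain a blocked term, at the cost of missing deliberate misspellings.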